Abstract


Background 


More and more herbaria are digitising their collections. Images of specimens are made 
available online to facilitate access to them and allow extraction of information from them. 
Transcription of the data written on specimens is critical for general discoverability and 
enables incorporation into large aggregated research datasets. Different methods, such as 
crowdsourcing and artificial intelligence, are being developed to optimise transcription, but 
herbarium specimens pose difficulties in data extraction for many reasons. 


© Dillen M et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 
4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are 
credited. 
New information


To provide developers of transcription methods with a means of optimisation, we have 
compiled a benchmark dataset of 1,800 herbarium specimen images with corresponding 
transcribed data. These images originate from nine different collections and include 
specimens that reflect the multiple potential obstacles that transcription methods may 
encounter, such as differences in language, text format (printed or handwritten), specimen 
age and nomenclatural type status. We are making these specimens available with a 
Creative Commons Zero licence waiver and with permanent online storage of the data. By 
doing this, we are minimising the obstacles to the use of these images for transcription 
training. This benchmark dataset of images may also be used where a defined and 
documented set of herbarium specimens is needed, such as for the extraction of 
morphological traits, handwriting recognition and colour analysis of specimens. 


Introduction 


Herbarium specimens are a research tool, an archive and a reference for plant sciences. 
They provide data and verifiability to disciplines such as phytogeography, taxonomy and 
ecology (Baird 2010). These physical specimens are divided between an estimated 3,000
herbaria worldwide, which makes consulting all of them in person virtually impossible. To facilitate
access to the specimens, many herbaria are digitally imaging their collections and making 
these images available over the internet (Heerlien et al. 2015, Tulig et al. 2012). Although 
we have a long way to go before full digitisation of the world’s herbaria, there are already 
about 400 million digitised specimens and the number keeps growing (Thiers 2018). As of 
November 2018, there are more than 70 million preserved specimen records for plants in 
the Global Biodiversity Information Facility (GBIF.org 2018). 


As digital imaging of the world’s herbaria continues, there is a recognition that large 
amounts of information can be extracted from these images. This information includes data 
concerning the specimen's origin on the labels, such as location, date and collector, but 
also traits and the identity of the plant itself (Corney et al. 2012, MacGillivray et al. 2009, 
Schuettpelz et al. 2017, Kho et al. 2017). Methods to extract these data are still being 
developed and require training datasets and test images to validate their effectiveness 
(Carranza-Rojas et al. 2017). 


Herbarium specimens are far from homogeneous. They vary in the language, location and 
style of the labels, in whether they are typed or handwritten and in the quality and quantity 
of information on the labels (Mononen et al. 2014). Specimens are frequently annotated by 
more than one person and are stored by taxon, rather than by collector, both of which make 
handwriting recognition particularly difficult. A typical specimen will have text written on 
different dates and by different people in a mixture of printed, typed and handwritten scripts 
(Vollmar et al. 2010). 
Not all herbarium digitisation projects are the same. They vary in aspects such as
imaging methodology, the resolution of the digital images created and their approach to
quality control (Nelson et al. 2012). Anyone building tools to analyse herbarium specimens
needs to be aware of these variations and to account for them. The language used
on the labels can also be problematic. Many collections have specimen labels written in a
wide variety of languages, sometimes on the same specimen, and one cannot assume the
use of Latin script, even in Europe. The interpretation of certain symbols, such as those
indicating nomenclatural type status, or of different labels, may not always be clear either.


For all these reasons, we feel it is useful to provide a benchmark dataset of digitised 
herbarium specimens, made openly available for the development of tools and workflows 
for data extraction. This dataset has been placed in the public domain specifically to act as 
a test dataset for research and a benchmark to compare different methods. We have also 
provided transcribed data, where available, associated with each image, which can be 
used for comparison or for training systems. In addition, for 250 of the specimens, we have 
provided image overlays that identify the position of labels. These can be used for 
segmentation analysis of the specimen. 


The images have been released under a Creative Commons Zero licence waiver (https://creativecommons.org/licenses),
to ensure that there are no limitations that could hinder or
discourage anyone from using them. However, the authors expect users to follow the
norms of scientific citation. Each upload of images and data about a specimen has been
assigned a DOI (Digital Object Identifier), which uniquely and persistently identifies it to
allow citation. Data and media are provided as they were at the time of assembly and will not
be kept in sync with new developments at the collection level after publication. Stable
identifiers for the collection specimens themselves can always be found as 'Alternative
identifiers' on each upload's landing page (Groom et al. 2017, Guntsch et al. 2017).


Sampling methods 


Study extent: Curators from nine European herbaria volunteered to provide a sample of 
their digitally imaged herbarium sheets. Herbarium curators were requested to select 
specimens following a set of guidelines that were chosen to ensure a representative
cross-section of specimen characteristics. The aim was to provide specimens that could answer
questions related to the language, condition, age and geography of the specimen and, at 
the same time, provide a sufficient sample size for statistical analysis (Table 1). Given the 
different origins and curatorial practices of different collections, not all institutions were 
capable of following these guidelines in full, particularly if the herbarium did not hold many 
type specimens (Table 2). For the remainder of this article, we will use the Index
Herbariorum codes listed in that table to abbreviate institution names.
Table 1.


The guidelines given to herbaria to select specimens for the test dataset. The goal was not to have 
a representative sample of all specimens, but to have comparable subsets, which will have labels 
written in different languages; will be printed or handwritten; will cover a wide range of dates; will be 
both type specimens and general collections and will provide specimens from different families and 
different parts of the world. 


Number of specimens | Type status | Date of collection | Geography
25 | Type | < 1970 | Any country
25 | Type | > 1970 | Any country
25 | Non-type | < 1970 | From the country where the herbarium is located
25 | Non-type | > 1970 | From the country where the herbarium is located
100 | Non-type | Any | Non-type specimens from one other country or region of which the herbarium possesses a substantial number of specimens
Table 2. 


Contributions of nine different institutes to the dataset. Availability of JPG and TIFF images is
indicated, as well as the source of label data. Most institutes were able to follow the template in
Table 1. The regions picked for the 100 non-type specimens are indicated in the last column, as are
deviations from the template in Table 1. Institution codes follow Index Herbariorum
(http://sweetgum.nybg.org/science/ih/). The DOI of the collection is listed if GBIF was used as a data
source. FinBIF is the Finnish Biodiversity Information Facility available at www.species.fi (Schulman
et al. 2018). JACQ is a joint specimen data management system of over 30 European and Asian
herbaria available at https://herbarium.univie.ac.at/database/ (Rainer and Vitek 2009).


Institute | Institution Code | Data Source | Composition (with ISO 3166-1 alpha-2 Country Codes)
Meise Botanic Garden | BR | 10.15468/wrthhx | As Table 1; 100 from AU, CA, NZ, US
Royal Botanic Gardens, Kew | K | 10.15468/ly60bx | As Table 1; 100 from BR
Natural History Museum, London | BM | 10.5519/0002965 | As Table 1; 100 from AU, CA, NZ, US
Botanic Garden and Botanical Museum, Berlin | B | JACQ | As Table 1; 100 from AU, BR, CN, ID, TZ, US
Royal Botanic Garden Edinburgh | E | 10.15468/ypoair | As Table 1; 100 from CN
National Museum of Natural History, Paris | P | 10.15468/nc6rxy | 50 type, 50 non-type FR, 100 non-type not FR
Natural History Museum, University of Tartu | TU | 10.15156/bio/587444 | 100 < 1970, 100 > 1970
Naturalis Biodiversity Center | L | 10.15468/ib5ypt | As Table 1; 100 from ID; no selection on date
Finnish Museum of Natural History LUOMUS, University of Helsinki | H | FinBIF | As Table 1; 14 FI, 36 ET instead of 50 FI; 100 from AU, BR, CN, ID, US


Sampling description: Where possible, images were collected in JPG and lossless TIFF 
formats. Data were collated as a Darwin Core (DwC) Archive (Darwin Core Task Group, 
Biodiversity Information Standards (TDWG) 2009), if available from the Global Biodiversity 
Information Facility (GBIF) using their application programming interface (API) through rgbif 
in the R programming language (Chamberlain 2017, R Core Team 2017). Data were 
processed in the R language using the tidyverse (Wickham 2017) and rworldmap (South 
2011) packages. Scripts can be found in Suppl. materials 3, 4. 
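The authors retrieved GBIF records with rgbif in R (Suppl. materials 3, 4). For readers working in Python, the sketch below shows how equivalent requests to the public GBIF occurrence search API could be built from a list of barcodes. `datasetKey` and `catalogNumber` are documented query parameters of that endpoint; the function names are hypothetical, and the dataset key (a UUID) has to be looked up on GBIF, since the DOIs in Table 2 are not the same thing as dataset keys. The sketch only builds URLs; fetching, paging and error handling are left out.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (REST API v1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(dataset_key: str, catalog_number: str) -> str:
    """Build a search URL for one specimen barcode within one GBIF dataset.

    dataset_key: GBIF dataset UUID of the contributing herbarium.
    catalog_number: specimen barcode, matched against dwc:catalogNumber.
    """
    params = {"datasetKey": dataset_key, "catalogNumber": catalog_number}
    # urlencode escapes spaces and reserved characters in barcodes.
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def urls_for_barcodes(dataset_key: str, barcodes: list[str]) -> list[str]:
    """One search URL per barcode, in input order."""
    return [occurrence_search_url(dataset_key, code) for code in barcodes]
```

Each URL returns JSON whose `results` list holds the matching occurrence records, which could then be flattened into a Darwin Core table.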


Quality control: There are no clear indications in any data as to whether a specimen is 
completely or partially transcribed. The labels may contain a variety of information, so
whether a given piece of information appears in the data is not a good guide to whether it is present on the
label. Nevertheless, of the 1,800 specimens, 94% had a collector listed and 56% had either
a collector number or an explicit indication that there was no number. A total of 85% had 
either a verbatim or interpreted date and 90% had either a date or collector number. Hence, 
most of the specimens have some level of transcription. 
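The completeness figures above amount to simple presence checks on Darwin Core terms. A minimal sketch, assuming each record is a dict keyed by DwC term names (recordedBy, recordNumber, eventDate, verbatimEventDate) and that an explicit "no number" is stored as a non-blank value such as "s.n."; the authors' own R code is in Suppl. material 4 and may count differently.

```python
def is_filled(record: dict, term: str) -> bool:
    """True if a Darwin Core term is present and not blank."""
    value = record.get(term)
    return value is not None and str(value).strip() != ""


def has_collector(record: dict) -> bool:
    return is_filled(record, "recordedBy")


def has_number(record: dict) -> bool:
    # An explicit "s.n." (sine numero) is non-blank, so it counts here too.
    return is_filled(record, "recordNumber")


def has_date(record: dict) -> bool:
    return is_filled(record, "eventDate") or is_filled(record, "verbatimEventDate")


def has_date_or_number(record: dict) -> bool:
    return has_date(record) or has_number(record)


def share(records: list[dict], predicate) -> float:
    """Fraction of records satisfying predicate (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


def completeness_summary(records: list[dict]) -> dict[str, float]:
    return {
        "collector": share(records, has_collector),
        "collector_number": share(records, has_number),
        "date": share(records, has_date),
        "date_or_number": share(records, has_date_or_number),
    }
```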


All specimens were analysed by a polyglot to determine the primary language of the label. 
In some cases the label had no dominant single language or no other text beyond the 
scientific Latin name of the specimen. However, the language of the label could be
identified for 90% of the dataset (Fig. 1). English appears most frequently, with just over
42%. Five other languages occur in more than 5% (i.e. more than 90) of specimens: French, Latin,
Estonian, German and Dutch. Of these, only Estonian is linked to a single institution. This 
should make language-based analysis possible with this dataset. 
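The 5% cut-off above can be reproduced from a list of per-specimen language codes. A sketch assuming one code per specimen, using "ZZ" for undetermined labels as in Fig. 1; the layout of the language file in Suppl. material 3 is not described here, so reading it is left out.

```python
from collections import Counter

UNDETERMINED = "ZZ"  # code used in Fig. 1 when no single language applies


def frequent_languages(labels: list[str], threshold: float = 0.05) -> list[tuple[str, int]]:
    """Languages used on more than `threshold` of all specimens, most frequent first.

    The share is taken over all specimens, including undetermined ones,
    but the undetermined code itself is never reported.
    """
    counts = Counter(labels)
    cutoff = threshold * len(labels)  # for 1,800 specimens at 5%, about 90
    return [
        (lang, n)
        for lang, n in counts.most_common()
        if lang != UNDETERMINED and n > cutoff
    ]
```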


Step description: As detailed in Table 2, we compiled 200 images from each of nine 
institutes from seven different countries across Europe. All institutions provided JPEG 
format images and all but two could also provide TIFF format images; P and L were unable 
to provide TIFF images due to institutional policy. Across institutions, TIFF file sizes
ranged from 25 to 306 MB and image dimensions from 10 to 102 megapixels.
Significant differences also occurred within institutions, due to different herbarium sheet
sizes as well as different in-house scanning protocols. A total of 150 TIFFs of non-type
specimens from H were horizontally photographed in two parts. To reconstruct an image of 
the whole specimen, these were stitched together in a semi-automated way using Adobe 
Photoshop CS4. No further image processing of JPEG or TIFF files was done. 
Figure 1.


A classification of the languages used on the labels of the different specimens. EN = English, FR
= French, LA = Latin, ET = Estonian, DE = German, NL = Dutch, PT = Portuguese, ES =
Spanish, SV = Swedish, RU = Russian, FI = Finnish and IT = Italian. ZZ indicates that a single
language could not be determined: either multiple languages were used on the label,
there was no obvious use of a particular language (i.e. only scientific Latin terms) or the
language was not readily identifiable. Different herbaria are identified by their Index
Herbariorum codes (Institution Code in Table 2).
Figure 2.


The distribution of collection dates (by year, if known) of the specimens in the dataset for each
providing institution. The heat colour indicates the number of specimens for each 10-year
period. Year data were extracted from Darwin Core eventDate and verbatimEventDate if these
were in ISO 8601 format. Codes for the herbaria follow Table 2.




Figure 3.


A stacked pie chart generated using Krona (Ondov et al. 2015, https://github.com/marbl/Krona/),
depicting the taxonomic distribution by phylum, order and family (if known). Missing higher
taxa were retrieved from the GBIF backbone by family, if possible. For H, they were retrieved
by genus, as family was unavailable. An interactive version of this graph is available as an
HTML file in the supplementary material.


For seven of the institutions, the data associated with these images were downloaded
using the GBIF API (accessed 2018-07-12). One of the other two (B) provided these data
in DwC format directly. The other (H) had no method to export all data in DwC format, so
data were extracted in JSON format using its API (https://api.laji.fi) (accessed
2018-07-09). These data were subsequently mapped to DwC in the R language using the
package jsonlite (Ooms 2014; see Suppl. material 2 for the mapping). The data for all
specimens were then merged into a single table with a full join in R. These data are available as Suppl. material
5 as a Comma Separated Value (CSV) file, as well as individual JSON-LD files for each
image (script for this conversion in Suppl. material 6). Data are provided as they were
available at the time of compilation and will not be kept up to date. However, users may be able to download
up-to-date data from institutional repositories through the persistent identifiers of the
specimens.
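Suppl. material 6 contains the authors' R script for the CSV-to-JSON-LD conversion. The Python sketch below only illustrates the general idea: it assumes every CSV column is named after a Darwin Core term and maps each one into the DwC namespace through a JSON-LD `@context`. The function names are hypothetical and the structure of the published JSON-LD files may differ.

```python
import csv
import io
import json

# Darwin Core terms namespace; the "dwc:" prefix is expanded through @context.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def row_to_jsonld(row: dict) -> dict:
    """Turn one CSV row (column name = DwC term) into a JSON-LD document.

    Blank cells are omitted rather than written as empty strings.
    """
    doc = {"@context": {"dwc": DWC_NS}}
    for term, value in row.items():
        # DictReader stores surplus cells under the key None; skip those too.
        if term is None or value is None or value.strip() == "":
            continue
        doc[f"dwc:{term}"] = value
    return doc


def csv_to_jsonld(csv_text: str) -> list[str]:
    """Serialise every row of a CSV table as a separate JSON-LD string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row_to_jsonld(row), ensure_ascii=False, indent=2) for row in reader]
```

Each returned string could then be written to its own file, named for example after the specimen's catalogue number.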


For a subset of 250 specimens with labels in English, two PNG overlays were
made manually for each image in GIMP (GIMP Development Team 2017). These overlays indicate the
location and class of labels, stamps, colour charts or other reference objects on the imaged
herbarium specimen. One overlay indicates the location of each label with a different colour
against a black background (indicated as "_all"). The other overlay indicates the class of
labels using a colour code of white for barcode labels, yellow for colour charts and red for
any other sort of label (indicated as "_sel"). These overlays can be used to train algorithms
to identify labels in order to facilitate data transcription.
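One way the overlays might be used is to turn each coloured region into a bounding box. A minimal sketch, assuming the overlay has been loaded as rows of (r, g, b) tuples (for example with an imaging library such as Pillow) and that the colours are exactly pure white, yellow and red on pure black; anti-aliased edges or other shades would need a tolerance. On the "_sel" overlays all labels of one class share a colour, so this merges them into one box; separating them would need connected-component labelling. A pure-Python loop is also far too slow for full-size images, where an array library would be the practical choice.

```python
# Colour code of the "_sel" overlays as described in the text (exact RGB assumed).
SEL_CLASSES = {
    (255, 255, 255): "barcode label",
    (255, 255, 0): "colour chart",
    (255, 0, 0): "other label",
}
BACKGROUND = (0, 0, 0)


def bounding_boxes(pixels):
    """Bounding box per colour: {rgb: (top, left, bottom, right)}, inclusive.

    `pixels` is a list of rows, each a list of (r, g, b) tuples. On an "_all"
    overlay each colour is one label; on a "_sel" overlay each colour is a class.
    """
    boxes = {}
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            rgb = tuple(rgb)
            if rgb == BACKGROUND:
                continue
            if rgb not in boxes:
                boxes[rgb] = [y, x, y, x]
            else:
                box = boxes[rgb]
                box[0] = min(box[0], y)
                box[1] = min(box[1], x)
                box[2] = max(box[2], y)
                box[3] = max(box[3], x)
    return {rgb: tuple(box) for rgb, box in boxes.items()}


def sel_classes(boxes):
    """Rename "_sel" colours to class names, dropping unknown colours."""
    return {SEL_CLASSES[rgb]: box for rgb, box in boxes.items() if rgb in SEL_CLASSES}
```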
Geographic coverage


Description: Locations were mapped using their country code and decimal coordinates 
(Fig. 4). A total of 15% had decimal coordinate values and 94% had a country code. 
Specimens originate from all continents, except Antarctica. 


Figure 4.


The location of geolocated specimens within the dataset and the number of specimens from 
each country. A total of 267 (15%) specimens have coordinates associated with them and 
1,695 (94%) are located to a country. Both categories may overlap. The map uses a Mollweide 
equal-area projection. 
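For readers who want to reproduce a Mollweide map such as Fig. 4 outside R, the standard forward equations are: solve 2θ + sin 2θ = π sin φ for the auxiliary angle θ, then x = (2√2/π) R λ cos θ and y = √2 R sin θ, with longitude λ and latitude φ in radians. The paper's map was made with rworldmap; this Python version is an independent illustration. It uses bisection rather than the more common Newton iteration, trading speed for guaranteed convergence near the poles.

```python
import math


def auxiliary_angle(phi: float, iterations: int = 60) -> float:
    """Solve 2*theta + sin(2*theta) = pi*sin(phi) for theta by bisection.

    The left-hand side increases monotonically on [-pi/2, pi/2], so bisection
    always converges, including near the poles where Newton steps divide by
    a derivative close to zero.
    """
    target = math.pi * math.sin(phi)
    lo, hi = -math.pi / 2, math.pi / 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if 2 * mid + math.sin(2 * mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mollweide(lon_deg: float, lat_deg: float, radius: float = 1.0) -> tuple[float, float]:
    """Project a longitude/latitude pair in degrees to Mollweide (x, y)."""
    lam = math.radians(lon_deg)
    theta = auxiliary_angle(math.radians(lat_deg))
    x = radius * (2 * math.sqrt(2) / math.pi) * lam * math.cos(theta)
    y = radius * math.sqrt(2) * math.sin(theta)
    return x, y
```

With the default unit radius, the map spans x from -2√2 to 2√2 at the equator and y from -√2 to √2.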


Taxonomic coverage 


Description: The higher taxonomy of specimens was determined from the GBIF backbone 
taxonomy when those data came from GBIF. For the data that did not originate from GBIF, 
we matched the family (B specimens) or genus (H specimens) in the data to the backbone 
(GBIF Secretariat 2017). Only seven specimens could not be matched to the backbone. 
Two had no identification and the other five were homonyms at the genus level (e.g. 
Pellaea, which is both an animal and a plant genus). Not unexpectedly, more than 90% of specimens were
Tracheophyta, but within this phylum there was broad taxonomic
coverage of 204 different families in 58 different orders (Fig. 3, Suppl. material 1).
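Matching a family or genus to the GBIF backbone can be done through GBIF's species match API, whose documented parameters include `name`, `rank` and `kingdom`; a kingdom hint is one way such homonym failures might be reduced, although we do not know whether it would have resolved the five cases above. The sketch below builds the request URL and reads the higher classification from an already parsed response, relying on the response fields `matchType`, `kingdom`, `phylum`, `order` and `family`. The function names are hypothetical and the HTTP call itself is left out.

```python
from typing import Optional
from urllib.parse import urlencode

# GBIF backbone name-matching endpoint (REST API v1).
GBIF_SPECIES_MATCH = "https://api.gbif.org/v1/species/match"

HIGHER_RANKS = ("kingdom", "phylum", "order", "family")


def species_match_url(name: str, rank: Optional[str] = None,
                      kingdom: Optional[str] = "Plantae") -> str:
    """Build a backbone-match URL; the kingdom hint may help with homonyms."""
    params = {"name": name}
    if rank:
        params["rank"] = rank
    if kingdom:
        params["kingdom"] = kingdom
    return f"{GBIF_SPECIES_MATCH}?{urlencode(params)}"


def higher_taxonomy(match: dict) -> Optional[dict]:
    """Pull the higher classification out of a parsed match response.

    Returns None when GBIF reports no match (matchType "NONE").
    """
    if match.get("matchType", "NONE") == "NONE":
        return None
    return {rank: match.get(rank) for rank in HIGHER_RANKS}
```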


Although we aimed to include 25% nomenclatural type specimens in the dataset,
according to the specimen data only 19% are types. This lower value is because some collections
are not created primarily for taxonomy and therefore do not hold many types. Non-type
specimens were selected as specimens without any recorded type status. Hence, some specimens
listed as non-types could actually be types, if they had not been identified as such in their
digital publication.
Regarding collector names, there are 1,170 different names associated with
the dataset. However, there are likely to be duplicates amongst those 1,170, as some names
referring to the same collector will not be exact textual matches. Only 6% of the specimens had no collector information.
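The effect of inexact textual matches on the collector count can be illustrated with a crude normalisation. The rules below are hypothetical and were not used for the figure of 1,170: they strip accents, case and punctuation and sort the remaining tokens, so "Smith, J." and "J. Smith" collapse to one key. They will not link "John Smith" to "J. Smith", and any such normalisation can also wrongly merge different people who share a name.

```python
import re
import unicodedata


def collector_key(name: str) -> str:
    """Crude comparison key for a collector string (hypothetical rules)."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, turn punctuation into spaces, then sort tokens to ignore order.
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def count_distinct_collectors(names: list[str]) -> int:
    """Number of distinct non-empty keys among the given collector strings."""
    return len({key for key in map(collector_key, names) if key})
```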


Temporal coverage 


Notes: A broad temporal coverage of the dataset was promoted by splitting the sample at
1970 (Table 1). Year values were derived from the DwC terms eventDate or
verbatimEventDate if these were in ISO 8601 format or otherwise standardised (Fig. 2). A year
of collection could be identified for 82% of the specimens.
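A minimal sketch of the year extraction and the per-decade counts behind Fig. 2, assuming only values that start with an ISO 8601 calendar date (YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by a time or an interval) are accepted. The paper also used values that were otherwise standardised, which this does not attempt to parse, so it may find fewer years than reported.

```python
import re
from collections import Counter
from typing import Optional

# A leading ISO 8601 calendar date: YYYY, YYYY-MM or YYYY-MM-DD, followed by
# the end of the string, a time ("T...") or the end of an interval ("/...").
ISO_DATE = re.compile(r"^(\d{4})(?:-\d{2}(?:-\d{2})?)?(?=$|[T/])")


def collection_year(record: dict) -> Optional[int]:
    """Year from eventDate, else verbatimEventDate, if either starts with an ISO date."""
    for term in ("eventDate", "verbatimEventDate"):
        value = (record.get(term) or "").strip()
        match = ISO_DATE.match(value)
        if match:
            return int(match.group(1))
    return None


def decade_counts(records: list[dict]) -> Counter:
    """Number of specimens per decade (1920 = 1920-1929), skipping undated ones."""
    years = (collection_year(r) for r in records)
    return Counter(year // 10 * 10 for year in years if year is not None)
```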


Usage rights 


Use license: Creative Commons Public Domain Waiver (CC-Zero) 


Data resources 


Data package title: A benchmark dataset of herbarium specimen images with label data: 
Summary 


Resource link: https://zenodo.org/communities/icedigtest/ 


Alternative identifiers: 10.5281/zenodo.1492197 
Number of data sets: 1 


Data set name: Benchmark Dataset 


Description: This landing page contains a CSV file compiling all data associated with 
herbarium specimens that are part of this dataset, as they could be found on GBIF, 
JACQ or FinBIF. 


In addition, DOIs of the individual specimens uploaded to Zenodo and direct links to the
different files (JPEG, TIFF, JSON, PNG) are also included. Index of these added
variables:


- persistentID: Persistent Identifier of the collection specimen. The persistent identifier 
is maintained by each institution and should always lead to the most up-to-date version 
of a digital specimen record. Apart from the persistent identifier, other data are liable to
be amended in institutional databases. Data uploaded as part of this dataset will not
be updated with changes at the collection's repository, but this persistent URI will 
always point to the up-to-date information in the institutional system. 


- jpegURL, tiffURL, jsonURL: URLs pointing straight to the respective image and data
files themselves, to facilitate (selective) batch downloads. 
- pngSegAllURL and pngSegSelURL: Segmented overlays of the herbarium specimens
indicating the location of different labels and reference material on the sheet ("All") and
their content ("Sel"). More information can be found in the paper (in prep.) associated
with this data publication and in the individual depositions themselves.


- DOI: The DOI of the deposition of images and data of these specimens on Zenodo.
DOIs point to the most up-to-date version of these depositions at the time of the
publication of this CSV file. As a rule, this CSV file will be updated should any changes 
happen to any of the depositions. 


- jpegURL2, tiffURL2: A few herbarium sheets had labels on the back and therefore
consisted of two scans. As a rule, the scans of the labels are in these columns.


Column label | Column description
Data and links.csv | Supplementary Info 5


Additional information 


As an increasing number of herbarium specimens are digitally imaged, the possibility of 
automated analysis becomes more attractive. However, simply providing access to the 
digital images does not enable full use of the resource. The data associated with the image 
also need to be accessible for most analyses and this requires these data to be digitised, 
categorised and standardised (Scoble 2010). 


The digitisation of label data is one of the most significant bottlenecks to the full digitisation 
of herbaria (Barber et al. 2013). Digital image capture is only one step towards full 
digitisation. For this reason, many groups are working on ways to improve and simplify the 
process of label data capture (Hill et al. 2012, Haston et al. 2015, the ICEDIG Project: http://icedig.eu).
Currently, the main method being used is human transcription, either by
professional transcribers or by volunteers. Professionals may be herbarium employees or
outsourcing companies and they may work on bespoke IT systems or online. Volunteers
are often recruited online through citizen science portals, such as Notes from Nature
(https://www.notesfromnature.org), Les Herbonautes (http://lesherbonautes.mnhn.fr), DoeDat
(https://www.doedat.be) and DigiVol (https://digivol.ala.org.au/). These different methods and
platforms vary considerably in their approaches to quality control and completeness of
transcription. There are many unresolved questions about the success of these different
approaches, the quality of the data they generate and their cost-effectiveness (Ellwood et
al. 2015). Such questions of quality can be addressed with a benchmark dataset of
images, such as the one described in this paper. Trials involving this dataset on the
transcription platforms mentioned above are already underway and some had been
completed by the time this article was submitted. A publication including a comparative
analysis is planned.
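The paper does not prescribe a quality metric. One common choice for comparing a transcription, human or automated, with a reference value is the character error rate (CER): the Levenshtein edit distance divided by the length of the reference. The sketch below assumes the benchmark values serve as the reference, bearing in mind that they may themselves be incomplete (see Quality control above).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by reference length.

    An empty reference makes the ratio undefined; by convention here it
    scores 0.0 if the hypothesis is also empty and 1.0 otherwise.
    """
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)
```

Scores could be computed per Darwin Core field (for example recordedBy or locality) and averaged per platform or method.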
Another approach to data extraction is automation. This might involve optical character
recognition of text or other forms of pattern recognition (Drinkwater et al. 2014). Various
research groups have had some success in this (Haston et al. 2015). Yet questions remain
about the quality of the data output, including the post-processing needed to classify and standardise the
data obtained, and about how this compares to human transcription. Again, benchmark datasets
are required to provide a comparison of techniques. However, even this diverse set of
specimens will not provide training data for every possible need, particularly considering
the wide range of languages used on specimens. The subset of 250 image overlays
indicating the location and nature of the sheet labels can be used to evaluate the impact of
segmenting out the labels before automated data extraction. It can also serve to train
algorithms designed to automate label recognition.
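To judge an automated label detector against the 250 hand-made overlays, one widely used measure (not one prescribed by the paper) is intersection over union (IoU) between a predicted box and a reference box. A sketch assuming boxes are given as (top, left, bottom, right) with inclusive pixel bounds; in object-detection benchmarks a detection is often counted as correct when IoU is at least 0.5.

```python
Box = tuple[int, int, int, int]  # (top, left, bottom, right), inclusive pixel bounds


def box_area(box: Box) -> int:
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes (0.0 to 1.0)."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    # Non-overlapping boxes give a negative extent, clamped to zero.
    intersection = max(0, bottom - top + 1) * max(0, right - left + 1)
    union = box_area(a) + box_area(b) - intersection
    return intersection / union
```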


Digital images of herbarium specimens may also be used for other purposes, for example, 
to extract trait data from plants or to identify the species in question (Cope et al. 2012, 
Carranza-Rojas et al. 2017). However, techniques and software need to be developed to
industrialise this to the scale required. Artificial intelligence and machine learning (ML)
techniques are likely to be applied for this, and this dataset could be used to
train and test them (Wäldchen et al. 2018).


Some analysis techniques may only be suitable for certain types of specimen, for example, 
when ML algorithms are trained only in one language or the handwriting of one collector. 
Here, we have provided a wide variety of test images from which subsamples can be 
selected for different purposes. However, in selecting the images, we have not attempted to 
provide a random subsample of specimens, but have tried to provide a good cross-section 
of the different kinds. This means that some countries, languages and scripts are not
represented at all in the dataset and the dataset will be biased geographically and
taxonomically. However, for those countries and languages represented in the set, there
will be multiple specimens.
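When carving training and test subsets out of a deliberately non-random set like this one, it can help to split within each stratum (for example each institution, via the DwC term institutionCode, or each label language) so that every group appears on both sides in the same proportion. A minimal sketch with a hypothetical function name; because the dataset itself is biased, held-out scores still only describe performance on this dataset's mix of specimens.

```python
import random
from collections import defaultdict


def stratified_split(records: list[dict], key: str, test_fraction: float = 0.2, seed: int = 0):
    """Split records into (train, test), splitting each stratum of `key` separately."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    train, test = [], []
    for value in sorted(strata):
        group = strata[value][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```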


The whole dataset has been archived in the Zenodo research data repository (https://zenodo.org,
Suppl. material 7), where each specimen has its own digital object identifier
(DOI). This DOI resolves to a landing page on Zenodo, which contains the specimen's
currently available data as a JSON-LD file and the scanned image in compressed JPEG
format. If available, a lossless TIFF version of the image and two overlay PNG files can
be found there as well. Most data values have also been incorporated into the Zenodo
database to improve findability. They are encoded in the "Subjects" fields, combined with
persistent identifiers for the Darwin Core and Dublin Core terminology. A landing page with
its own DOI for the dataset as a whole contains a CSV file that comprises all available
metadata for each specimen and links to the JPEG, PNG and TIFF files. This overarching
file should make it easy to download parts of the dataset, such as JPEGs only, TIFFs only
or even English specimens only, with simple batch download scripts. The dataset can be
viewed at https://zenodo.org/communities/icedigtest. The CSV file from the dataset's
landing page can also be found in the Supplementary Info (Suppl. material 5).
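A batch download can start by filtering the CSV and collecting file URLs. A minimal sketch that reads the column names jpegURL and jpegURL2 described under Data resources; the filter column `labelLanguage` in the usage note is hypothetical, so check the actual CSV header before relying on it. Each collected URL could then be fetched, for example with `urllib.request.urlretrieve`.

```python
import csv
import io


def select_urls(csv_text: str, url_columns=("jpegURL", "jpegURL2"), where=None) -> list[str]:
    """Collect non-empty URLs from the given columns for rows passing an optional filter."""
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if where is not None and not where(row):
            continue
        for column in url_columns:
            value = (row.get(column) or "").strip()
            if value:  # second scans (jpegURL2) are empty for most sheets
                urls.append(value)
    return urls
```

For example, `select_urls(text, url_columns=("tiffURL", "tiffURL2"), where=lambda row: row["labelLanguage"] == "EN")` would gather the TIFF links of English-label specimens, if such a column exists.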
Supplementary materials


Suppl. material 1: Taxonomic coverage (interactive HTML file)


Authors: Mathias Dillen 

Data type: Interactive chart 

Brief description: Interactive version of the taxonomic coverage chart, Figure 3 in the article.
Rendered using Krona (https://github.com/marbl/Krona). 

Filename: krona final.html (254.10 kb)


Suppl. material 2: R script used to map data from FinBIF API to DwC


Authors: Mathias Dillen 

Data type: R script 

Brief description: This R script was used to obtain metadata for the specimens from H in 
Darwin Core format, using the FinBIF API. Certain transformations depend on what was present in 
this specific dataset and might not be generically applicable. 

Filename: helsinki mapping.R (15.90 kb)


Suppl. material 3: R script CSV files


Authors: Mathias Dillen 

Data type: Zipped CSV files 

Brief description: This ZIP contains the CSV files necessary for the R script which retrieved and 
joined the metadata of the dataset and produced most of the graphs. 

In addition to seven files with 200 barcodes each for BR, BM, E, K, L, P and TU and two files 
containing all metadata for B and H, it also contains a file listing the label language for each 
specimen, a summary table for languages in the dataset and a file mapping DwC terms to their 
overarching categories. 
Filename: raw data files.zip (71.51 kb)


Suppl. material 4: R script used for this paper


Authors: Mathias Dillen 

Data type: R script 

Brief description: This R script file contains the different scripts used to obtain metadata, join it, 
export it and produce the paper's graphs (except for the taxonomic graph, which was done using 
data exported from R into the Krona Excel macro template, which can be found on GitHub). The
CSV files needed for this script are in a separate ZIP file. 

Filename: paperscript.R (16.02 kb)


Suppl. material 5: Table of specimen data, DOIs and URIs


Authors: Mathias Dillen 

Data type: CSV 

Brief description: This file contains data for the 1,800 digitised specimens that make up this paper's
dataset. The joined data originate from different sources as described above, and a few
repository-specific variables, such as GBIF taxon keys, have been filtered out. DwC extensions are
encoded in JSON.

This file also contains a list of DOIs and Zenodo file URIs (jpegURI, tiffURI...) for the images of
each specimen in this dataset. Using these links and DOIs, it should be easy to retrieve
and cite any part of this dataset as needed.

Filename: Data and links.csv (2.31 MB)


Suppl. material 6: R Script to compile JSON files from CSV


Authors: Mathias Dillen 

Data type: R script 

Brief description: This R script was used to convert data in CSV format to individual JSON-LD
files. The ZIP file also contains the original CSV file.

Filename: constructing json files final.zip (227.51 kb)


Suppl. material 7: Python script to upload the dataset to Zenodo


Authors: Mathias Dillen 
Data type: Python Script 


Brief description: This Python script was used to upload the dataset to the Zenodo platform 
through their API. 


Filename: final script for paper.py (8.65 kb)


